A statistical theory of nucleation in the presence of uncharacterised impurities 
First-order phase transitions proceed via nucleation. The rate of nucleation varies exponentially with the free-energy barrier to nucleation, and so is highly sensitive to variations in this barrier. In practice, very few systems are absolutely pure: there are typically some impurities present, which are rather poorly characterised. These interact with the nucleus, causing the barrier to vary, and so must be taken into account. Here the impurity-nucleus interactions are modelled by random variables. The rate then has the same form as the partition function of Derrida's Random Energy Model, and as in this model there is a regime in which the behaviour is non-self-averaging. Non-self-averaging nucleation is nucleation with a rate that varies significantly from one realisation of the random variables to another. In experiment this corresponds to variation in the nucleation rate from one sample to another. General analytic expressions are obtained for the crossover from a self-averaging to a non-self-averaging rate of nucleation.



I. INTRODUCTION 



Nucleation has long been known to be very sensitive to impurities. Very pure water can be cooled to tens of degrees below freezing (0°C at atmospheric pressure) before it crystallises. In practice, however, the water in our freezers freezes at only a little below 0°C [1]. The crystals of ice in our freezers presumably nucleate heterogeneously, in contact with some unknown impurity in the water. The nucleus of ice may be only a few water molecules, i.e., only a nanometer or so, across. Thus even impurities only a nanometer across can interact with the nucleus and so greatly reduce the free-energy barrier to nucleation. The impurity may of course be much larger. Often we know little of the impurity that provides a surface on which the nucleus of ice can form at a much lower free-energy cost than in the bulk. Here, we circumvent the problem that the impurities are typically uncharacterised by using a statistical theory. We address the question: under what conditions can chance variations in the impurities present cause the nucleation rate to vary significantly from sample to sample? That is, we develop a theory that links an observable, the variability of the nucleation rate, with the variability of the impurities at microscopic length scales.

Given the ubiquitous nature of this problem of heterogeneous nucleation occurring on uncharacterised impurities, relatively little theoretical work has been done. Karpov and Oxtoby […] have considered nucleation in the presence of random static disorder. Harrowell and Oxtoby […] looked at the effect of the distribution of time scales present in glasses. But this work did not address the problem of sample-to-sample variability, and there has been little theoretical work on the problem for a number of years. Castro and coworkers [5, 6] studied the process that follows nucleation, namely growth; see also Ref. […]. The pattern of growth depends on whether nucleation occurs continuously throughout the process of phase transformation or only at a few sites near the start of the process. We find that sample-to-sample variability occurs when one or a few sites have unusually low nucleation barriers. So there should be a correlation between the pattern of growth and sample-to-sample variability in the nucleation rate, and hence also with the final distribution of grain sizes if the new phase is crystalline. Castro and coworkers considered only growth: they did not explicitly consider nucleation, and they did not consider sample-to-sample variability.

Just as Karpov and Oxtoby did […], we will consider nucleation in the presence of disorder. We will model the system as a nucleus interacting with random disorder, i.e., the free energy of the nucleus will contain a part that is a random variable. We know that the free-energy barrier to nucleation depends on the interaction of the nucleus with unknown species. Faced with this, we recognise that a theoretical description cannot be based on precise knowledge, and so we make a plausible simple guess. Individual interactions are modelled by random variables with some mean and standard deviation, and the system is then characterised by just these two numbers.

The rate of nucleation at a site is proportional to the exponential of minus the free-energy barrier divided by the thermal energy $k_BT$. See the book of Debenedetti […] or the reviews of Oxtoby […] or of Kashchiev and van Rosmalen for an introduction to nucleation. Thus the rate at a particular site is proportional to the Boltzmann factor of the nucleus at that site. A sum over different sites with different free-energy barriers therefore has the form of a sum over Boltzmann weights. This is of course the form of a partition function, here of a system in which the energies are random variables. Such a system is called the Random Energy Model (REM). It was first proposed and studied by Derrida [10], who used it as a simple model for a glass. We can take over much of the analysis of the REM done by Derrida and apply it to our system. Most importantly, at low temperatures the REM is not self-averaging: different realisations of the disorder give rise to significantly different partition functions. In our system the analogue of the partition function of the REM is the total rate of nucleation, and
different realisations correspond to different samples prepared in the same conditions. So we have a regime in which the rate is not self-averaging: it differs significantly from sample to sample.

Note that this is distinct from variability in properties such as the time until the first nucleus appears. As the crossing of a nucleation barrier is a random process, the time it takes will always be a random variable. But if there is little or no variability in the free-energy barrier, the rate itself will self-average and so will not vary from sample to sample. Having recognised that our problem is isomorphic to Derrida's REM, we have a model for the experimental observation of sample-to-sample variability. This model gives quantitative relations between three quantities: the width of the distribution of the free-energy barriers to nucleation, the number of nucleation sites, and the sample-to-sample variability.

The next section is a very general study of nucleation with a free-energy barrier that contains a term that is a random variable. There the number of nucleation sites $N_s$ is fixed, although our theory can be generalised to deal with varying numbers of impurity nucleation sites; see Sec. II B.

Section III is devoted to the study of an explicit model of a disordered system: a surface composed of two types of monomers that are distributed at random. Figure 1 is a schematic of this model. We show how this random distribution of monomers leads to a random term in the free energy of a nucleus in contact with the surface. We then obtain an explicit expression for the width $w$ of the distribution of free-energy barriers. The model of Fig. 1 is just one possible system that results in a random term in the free-energy barrier to nucleation; we can envisage many others. Indeed, other activated processes with the same exponential dependence on the height of a free-energy barrier, such as protein unfolding [11], have essentially the same behaviour in the presence of disorder. Disorder can be a model not only for uncharacterised impurities but also for very complex environments, such as that inside a living cell.

Section IV outlines the use of Bayes's theorem to estimate the nucleation rate from a small number of observations of nucleation. This is useful for the following reason. If the nucleation rate can be estimated for two different samples and shown to differ between them, then the experimental system must be in the non-self-averaging regime. The last section is a conclusion.

FIG. 1: (Colour online) Schematic representation of a nucleus, represented by a 3 by 3 by 3 cube of dark blue monomers, in contact with a flat surface composed of 2 types of monomers: light and dark yellow.

II. GENERAL THEORY



Nucleation is an activated process […, 13]. As such, its rate has an exponential dependence on the free-energy barrier to nucleation, $\Delta F^*$, which is the free energy of the critical nucleus. The critical nucleus is, by definition, the nucleus at the top of the barrier to nucleation […]. Consider site $i$ of the system, with free-energy barrier $\Delta F_i^*$ and frequency of attempts at crossing the barrier $\nu_i$. The rate of nucleation at site $i$ is then



$$R_i = \nu_i\exp\left(-\Delta F_i^*\right). \tag{1}$$



We will assume that the attempt frequency $\nu_i$ is only weakly dependent on $i$, and so treat it as a constant: $\nu_i = \nu$. As $\Delta F_i^*$ is exponentiated, if it varies appreciably then its variation dominates that of $\nu_i$, which can then be neglected. We use units such that the thermal energy $k_BT = 1$. If the system consists of $N_s$ possible sites for nucleation, then the average nucleation rate per site is simply



$$R = N_s^{-1}\sum_{i=1}^{N_s}R_i \tag{2}$$

$$R = N_s^{-1}\,\nu\sum_{i=1}^{N_s}\exp\left(-\Delta F_i^*\right). \tag{3}$$

Thus to calculate the nucleation rate we require the $N_s$ values of the nucleation barrier at all possible nucleation sites.
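Equation (3) can be evaluated directly once a list of barriers is available. The sketch below is a minimal illustration, assuming barriers given in units of $k_BT$. The helper names `log_rate_per_site` and `lowest_barrier_fraction` are hypothetical. The code works with logarithms (the log-sum-exp trick) so that barriers of tens of $k_BT$ do not underflow. It also reports how much of the rate comes from the single lowest barrier.

```python
import math

def log_rate_per_site(barriers, nu=1.0):
    """ln R for Eq. (3): R = N_s^{-1} * nu * sum_i exp(-dF_i), dF_i in units of k_B T."""
    lowest = min(barriers)
    # Factor out exp(-lowest) so that every remaining term is <= 1 (log-sum-exp trick).
    s = sum(math.exp(lowest - b) for b in barriers)
    return math.log(nu) - lowest + math.log(s) - math.log(len(barriers))

def lowest_barrier_fraction(barriers):
    """Fraction of the total rate contributed by the site with the lowest barrier."""
    lowest = min(barriers)
    return 1.0 / sum(math.exp(lowest - b) for b in barriers)

barriers = [20.0, 25.0, 30.0]  # hypothetical barriers in k_B T units
print(log_rate_per_site(barriers))
print(lowest_barrier_fraction(barriers))
```

For these made-up barriers the lowest one supplies about 0.993 of the rate. Because of the exponential in Eq. (3), a barrier difference of a few $k_BT$ is already enough for one site to dominate.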

Often the system of interest is complex, or poorly characterised with unknown impurities present. Then we have little hope of determining all $N_s$ values of $\Delta F_i^*$. To deal with these situations we resort to a statistical approach: we guess the values of $\Delta F_i^*$. We do this by picking the $\Delta F_i^*$ from a probability distribution function characterised by two parameters, its mean $m$ and standard deviation $w$. These two parameters can in turn be obtained from a model, estimated from experimental data, or simply varied to see what qualitative behaviour is possible. We estimate them from a specific model in Sec. III.

It is convenient to express the $\Delta F_i^*$ as a mean plus a deviation,

$$\Delta F_i^* = m + \delta_i, \tag{4}$$

where $\delta_i$ is a random variable with zero mean: it is the deviation of the nucleation barrier at site $i$ from its mean value $m$. Taking the probability distribution of $\delta_i$, $p(\delta_i)$, to be a Gaussian, we have

$$p(\delta_i) = \frac{\exp\left[-\delta_i^2/(2w^2)\right]}{(2\pi w^2)^{1/2}}. \tag{5}$$



Using Eq. (4) for $\Delta F_i^*$ we can write Eq. (2) as

$$R = N_s^{-1}\,\nu\exp(-m)\sum_{i=1}^{N_s}\exp(-\delta_i). \tag{6}$$



Now, with the $\Delta F_i^*$ independent random variables, the rate of Eq. (6) is equivalent, except for constant factors, to the partition function of the Random Energy Model (REM) of Derrida [10]. The REM is a simple and well understood model of glasses and other disordered systems that undergo a transition to a state that is non-self-averaging.

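Just as the average partition function of the REM can be obtained, we can obtain the average of the nucleation rate $R$ over realisations of the disorder:

$$\langle R\rangle = N_s^{-1}\,\nu\exp(-m)\left\langle\sum_{i=1}^{N_s}\exp(-\delta_i)\right\rangle \tag{7}$$

$$\langle R\rangle = \nu\exp\left(-m + w^2/2\right), \tag{8}$$

where we used $\langle\exp(-\delta_i)\rangle = \exp(w^2/2)$ for the Gaussian of Eq. (5).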



If the rate $R$ is self-averaging, then for almost all realisations $R$ will be close to $\langle R\rangle$. The right-hand side of Eq. (8) will then be a good approximation to the nucleation rate of almost all realisations of the disorder, i.e., of almost all samples. But if the rate $R$ is not self-averaging, then Eq. (8) will not be a good approximation, and the rate $R$ will differ appreciably from one realisation to another. Nucleation in the presence of random static disorder was considered by Karpov and Oxtoby […]. They obtained results similar to Eq. (8), but they only considered self-averaging systems.
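A small Monte Carlo experiment shows the difference between the average rate of Eq. (8) and the rate of a typical realisation. The sketch below is an illustration only. It assumes Gaussian $\delta_i$ as in Eq. (5) and works with the reduced rate $R/[\nu\exp(-m)]$, so that $m$ and $\nu$ drop out. The function names and the values used ($N_s = 1000$, 200 realisations, $w = 1$ and $w = 6$) are arbitrary choices, not taken from this work.

```python
import math
import random
import statistics

def reduced_rate(n_sites, w, rng):
    """One realisation of R / (nu exp(-m)) = N_s^{-1} sum_i exp(-delta_i), Eq. (6)."""
    return sum(math.exp(-rng.gauss(0.0, w)) for _ in range(n_sites)) / n_sites

def sample_rates(n_sites, w, n_real, seed=0):
    """Reduced rates for n_real independent realisations of the disorder."""
    rng = random.Random(seed)
    return [reduced_rate(n_sites, w, rng) for _ in range(n_real)]

for w in (1.0, 6.0):
    rates = sample_rates(1000, w, 200)
    mean_exact = math.exp(w * w / 2.0)  # <R> / (nu exp(-m)), from Eq. (8)
    print(f"w={w}: median/<R> = {statistics.median(rates) / mean_exact:.3g}, "
          f"std/mean over realisations = {statistics.pstdev(rates) / statistics.mean(rates):.3g}")
```

For $w = 1$ the realisations should cluster tightly around Eq. (8). For $w = 6$ the median realisation should fall far below $\langle R\rangle$, which is the non-self-averaging behaviour analysed below.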



A. Measures of non-self-averaging behaviour 

We will now look at how the behaviour ceases to be self-averaging as the width $w$ of the distribution of free-energy barriers increases. Firstly, we look at how many nucleation sites contribute significantly to the nucleation rate in a typical realisation. Suppose this number is large. Then, as the sites are assumed independent, the rate is a sum of a large number of independent random variables and so will be self-averaging. If the number is small, this will not be the case.

From Eq. (6) we see that the rate $R$ is dominated by sites with values of $\delta_i$ at which the product of the number of sites and $\exp(-\delta_i)$ is a maximum. The number of sites is simply proportional to the probability of Eq. (5). Setting the derivative of $-\delta_i^2/(2w^2) - \delta_i$ to zero shows that the product $p(\delta_i)\exp(-\delta_i)$ is maximal at

$$\delta_{max} = -w^2. \tag{9}$$

Now, the average number of sites around this value of $\delta_i$ is just $N_sp(\delta_{max})$. Because this number is a sum over independent random variables (the $\delta_i$), the ratio of its fluctuations to its mean scales as $[N_sp(\delta_{max})]^{-1/2}$. The fluctuations in the number of sites that contribute the dominant amount to the rate, and hence the fluctuations in the rate itself, are therefore small relative to the mean if and only if $N_sp(\delta_{max}) \gg 1$. From Eqs. (5) and (9), neglecting the denominator of Eq. (5), this is true whenever $2\ln N_s - w^2 > 0$.

Thus the boundary between the self-averaging and non-self-averaging regimes is given by the equation

$$2\ln N_s - w^2 = 0. \tag{10}$$



Thus the rate is self-averaging if and only if the logarithm of the number of possible sites for nucleation is larger than half the variance of the nucleation barrier. This is the main result of this work. It is a very general result: it applies to activated processes in a random or near-random environment. Our conclusions here apply to any process with a rate given by an equation of the form of Eq. (6). In the next section we give the example of heterogeneous nucleation at a disordered surface, and in Ref. [11] we showed that the result held for a model of protein unfolding in vivo.
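The criterion of Eq. (10) reduces to two one-line functions. The sketch below is a minimal illustration. The names `self_averaging_margin` and `critical_width` are hypothetical, and the values of $N_s$ in the loop are arbitrary examples.

```python
import math

def self_averaging_margin(n_sites, w):
    """2 ln N_s - w^2 from Eq. (10): > 0 self-averaging, < 0 non-self-averaging."""
    return 2.0 * math.log(n_sites) - w * w

def critical_width(n_sites):
    """Width at which Eq. (10) holds, w_c = (2 ln N_s)^{1/2}."""
    return math.sqrt(2.0 * math.log(n_sites))

for n in (1e4, 1e8, 1e16):  # illustrative numbers of nucleation sites
    print(f"N_s = {n:.0e}: w_c = {critical_width(n):.2f}")
```

For $N_s = 10^8$ this gives $w_c \approx 6.07$, the value quoted in the discussion of Fig. 2. Because of the square root of a logarithm, even a factor of $10^8$ change in $N_s$ shifts $w_c$ only by a few $k_BT$.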

In the non-self-averaging regime, a single nucleation site can be responsible for a significant fraction of the entire rate. This must of course be the site with the lowest, i.e., most negative, value of $\delta_i$. We denote this lowest value by $x$, and call our estimate of it $\delta_{ev}$. The estimate is simply the value of $\delta$ at which the mean number density of sites, $N_sp(\delta)$, drops below 1. This is easy to see. First, $x$ cannot be much below the value of $\delta$ for which $N_sp(\delta) \approx 1$, as there are rarely any sites at all below this value. Second, it cannot be much above it, as for these values of $\delta$ there are many sites. Thus $\delta_{ev}$ satisfies the equation $N_sp(\delta_{ev}) = 1$, and so is given by

$$\delta_{ev} = -w\left(2\ln N_s\right)^{1/2}, \tag{11}$$

where to obtain this result we ignored the denominator of Eq. (5).
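The estimate of Eq. (11) can be compared with the lowest of $N_s$ sampled deviations. The sketch below is an illustration under stated assumptions: independent Gaussian $\delta_i$ as in Eq. (5), and arbitrary choices $N_s = 10^5$ and $w = 3$. The function names are hypothetical. Because Eq. (11) drops the denominator of Eq. (5), the sampled minimum should typically be somewhat less negative than $\delta_{ev}$ for moderate $N_s$.

```python
import math
import random

def delta_ev(n_sites, w):
    """Estimate of the lowest delta_i, Eq. (11): -w (2 ln N_s)^{1/2}."""
    return -w * math.sqrt(2.0 * math.log(n_sites))

def sampled_minimum(n_sites, w, seed=0):
    """Lowest of n_sites Gaussian deviations drawn as in Eq. (5)."""
    rng = random.Random(seed)
    return min(rng.gauss(0.0, w) for _ in range(n_sites))

n, w = 100_000, 3.0  # illustrative values
print(delta_ev(n, w), sampled_minimum(n, w))
```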

So when a single site dominates the rate $R$, and has a value of $\delta_i$ close to $\delta_{ev}$, the rate is approximately

$$R_1 \approx N_s^{-1}\,\nu\exp\left[-m + \left(2\ln N_s\right)^{1/2}w\right], \tag{12}$$

using Eq. (11) in Eq. (6). Note that $R_1 \ll \langle R\rangle$ for large widths. By Eq. (8), $\langle R\rangle$ increases as the exponential of $w^2$, whereas $R_1$ increases only as the exponential of $w$. For example, at $w = 6$, Eq. (9) tells us that the maximum contribution to the average rate $\langle R\rangle$ comes from sites with values of $\delta$ around $\delta_{max} = -36$. At these values of $\delta$ the probability density, Eq. (5), is close to $10^{-9}$. So even for $N_s = 10^8$ there is on average less than one site with $\delta$ close to $\delta_{max}$. Most realisations therefore have no sites around $\delta_{max} = -36$, and so have values of $R$ well below the mean value $\langle R\rangle$ and closer to $R_1$. The large value of $\langle R\rangle$ is due to a few realisations with very large values of $R$.
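The numbers quoted above for $w = 6$ can be reproduced in a few lines. The sketch below is minimal and assumes the Gaussian of Eq. (5). The grid spacing, and the unit-width window around $\delta_{max}$ used to count "nearby" sites, are arbitrary illustrative choices, and the function names are hypothetical.

```python
import math

def log_weight(delta, w):
    """ln[p(delta) exp(-delta)] with p the Gaussian of Eq. (5)."""
    return -delta * delta / (2.0 * w * w) - delta - 0.5 * math.log(2.0 * math.pi * w * w)

def gaussian_density(delta, w):
    """p(delta) of Eq. (5)."""
    return math.exp(-delta * delta / (2.0 * w * w)) / math.sqrt(2.0 * math.pi * w * w)

def expected_sites(n_sites, w, lo, hi):
    """Mean number of sites with lo < delta_i < hi, from the Gaussian CDF."""
    def cdf(d):
        return 0.5 * math.erfc(-d / (w * math.sqrt(2.0)))
    return n_sites * (cdf(hi) - cdf(lo))

w = 6.0
grid = [-60.0 + 0.01 * k for k in range(6001)]  # delta from -60 to 0
delta_max = max(grid, key=lambda d: log_weight(d, w))
print(delta_max)                   # close to -w**2 = -36, Eq. (9)
print(gaussian_density(delta_max, w))
print(expected_sites(1e8, w, delta_max - 0.5, delta_max + 0.5))
```

The density at $\delta_{max}$ comes out at about $10^{-9}$. With $N_s = 10^8$ the expected number of sites in the unit window is roughly 0.1, consistent with most realisations having no site near $\delta_{max}$.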

Our analysis started with Eq. (1), the standard expression for the rate of a barrier-crossing process. This is only valid if there is a barrier to cross, i.e., if $m + \delta_i$ is at least a few $k_BT$. Suppose there are sites present for which $m + \delta_i$ is close to zero; from Eq. (11), this is true if $m - (2\ln N_s)^{1/2}w$ is of order one or less. The nucleation rate at these sites will then be essentially $\nu$. In this case we would expect these sites to dominate the nucleation rate, as nuclei form effectively immediately at them. The rate will then be self-averaging if and only if the average number of these sites in a sample is much larger than one. In the remainder of the manuscript we will assume that $m - (2\ln N_s)^{1/2}w$ is at least a few $k_BT$.

Also, Eq. (12) is for the rate when it is dominated by a single site. We would often expect the growing domain of the nucleated phase to prevent the formation of further nuclei at a site where nucleation has occurred. If so, the rate $R$ will decrease once the first nucleus has formed, as only the other sites, with higher free-energy barriers to nucleation, will remain. We therefore expect non-self-averaging nucleation rates to be time dependent. When the rate $R$ contains contributions from many sites, it will only decrease after many nuclei have formed, and so any time dependence will be much less noticeable. The rates $R$ considered here are therefore initial rates. Determining the time dependence of rates requires study of the behaviour of nuclei after they have crossed the barrier, so we do not consider it here. See, however, Refs. […] for post-nucleation growth in systems with distributions of nucleation barriers.

We will now perform a quantitative analysis of the fraction of the rate due to the site with the lowest free-energy barrier, i.e., the one with $\delta_i = x$. We calculate the average, $f_{ev}$, of the fraction of the rate due to this site. It follows from the probability distribution function of $x$, $p_{ev}(x)$:

$$f_{ev} = \frac{\nu\exp(-m)}{N_s\langle R\rangle}\int p_{ev}(x)\exp(-x)\,dx. \tag{13}$$



We can simplify Eq. (13) by introducing the reduced variable $y = x/w$. Then, from Eq. (13) and using Eq. (8) for $\langle R\rangle$, we obtain

$$f_{ev} = N_s^{-1}\exp\left(-w^2/2\right)\int p_{ev}(y)\exp(-wy)\,dy, \tag{14}$$

where $p_{ev}(y)$ is the probability distribution function for the minimum value of a set of $N_s$ values taken from a Gaussian of zero mean and unit standard deviation. The absolute value of the rate $R$, and of the contribution of the extreme value, both depend on the mean $m$. Note, however, that $f_{ev}$ does not: it depends only on $w$ and $N_s$.

FIG. 2: The mean fraction, $f_{ev}$, of the rate $R$ that is due to the site with the lowest $\delta_i$, as a function of the width of the Gaussian, $w$. The solid, dashed and dotted curves are for three different numbers of sites, $N_s$.

The determination of $p_{ev}(y)$ is a standard problem in extreme-value statistics […]. We start from the probability that the minimum of $N_s$ values is $y$. This is the probability that 1 of the $N_s$ sites has the value $y$ and all the remaining $N_s - 1$ sites have larger values, multiplied by $N_s$, as any one of the $N_s$ sites can have the lowest value. Thus,



$$p_{ev}(y) = N_s\,p(y)\,p_>^{N_s-1}(y), \tag{15}$$



where $p(y)$ is a normalised Gaussian of zero mean and unit standard deviation. Here $p_>(y)$ ($p_<(y)$) is the probability of obtaining a number larger (lower) than $y$ from this Gaussian. We are interested in the region where $x$ is several standard deviations below the mean, $y \ll -1$. Now, $p_> = 1 - p_<$, and for $y \ll -1$ we have $p_< \ll 1$. Using $(1 - p_<)^{N_s-1} \approx \exp(-N_sp_<)$, we can rewrite Eq. (15) as

$$p_{ev}(y) \approx N_s\,p(y)\exp\left[-N_s\,p_<(y)\right], \tag{16}$$

where we replaced $N_s - 1$ by $N_s$. Also, $p_<(y) = (1/2)\,\mathrm{erfc}(-y/2^{1/2})$, which for $y \ll -1$ simplifies to

$$p_<(y) \approx \frac{\exp\left(-y^2/2\right)}{(2\pi)^{1/2}(-y)}. \tag{17}$$



In Fig. 2 we have plotted the fraction of the rate due to the site with the lowest barrier, $f_{ev}$, as a function of $w$. We took three values of $N_s$ spanning several orders of magnitude, including $N_s = 10^8$. For protein crystallisation [14], distinct sites should be at least 1 nm apart. Then $N_s = 10^8$ sites corresponds to a surface of order $100\,\mu\mathrm{m}^2$. The dependence on $N_s$ is logarithmic, so varying $N_s$ by orders of magnitude does not have a marked effect: $\ln N_s$ should nearly always be of order 10. We see that as $w$ increases, so does $f_{ev}$. For $N_s = 10^8$, Eq. (10) is satisfied for $w = 6.07$. For $w$ around this value, the site with the lowest barrier already contributes, on average, a large fraction of the total rate. This large contribution will vary significantly from one realisation to the next. So at large $w$ the fraction of the rate due to the site with the lowest nucleation barrier will vary substantially from realisation to realisation. For some realisations it will be rather larger than $f_{ev}$, and for others it will be much smaller. By contrast, if $w$ is small, the rate $R$ has significant contributions from many nucleation sites. It then varies only weakly from realisation to realisation, because variations in the rate average out in accordance with the central-limit theorem.
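Equations (14) and (15) can be evaluated by one-dimensional quadrature. The sketch below assumes a plain trapezoidal rule on a finite window. Substituting the Gaussian $p(y)$ into Eq. (14) gives $f_{ev} = \int (2\pi)^{-1/2}\exp[-(y+w)^2/2]\,p_>^{N_s-1}(y)\,dy$, which avoids multiplying very large and very small numbers. The exact form of Eq. (15) is used, with $p_>^{N_s-1}$ evaluated through its logarithm. The grid size, the integration window and the function names are arbitrary choices for illustration.

```python
import math

SQRT2 = math.sqrt(2.0)

def log_p_above(y):
    """ln P(Y > y) for a standard normal Y, accurate in both tails."""
    p_below = 0.5 * math.erfc(-y / SQRT2)
    if p_below < 0.5:
        return math.log1p(-p_below)              # y < 0: p_below is small
    return math.log(0.5 * math.erfc(y / SQRT2))  # y >= 0: use the upper tail directly

def f_ev(w, n_sites, n_grid=4001, half_width=10.0):
    """Mean fraction of the rate from the lowest-barrier site, Eqs. (14)-(15)."""
    lo, hi = -w - half_width, -w + half_width
    h = (hi - lo) / (n_grid - 1)
    total = 0.0
    for k in range(n_grid):
        y = lo + k * h
        shifted_gauss = math.exp(-0.5 * (y + w) ** 2) / math.sqrt(2.0 * math.pi)
        survival = math.exp((n_sites - 1) * log_p_above(y))  # p_>^(N_s - 1), Eq. (15)
        weight = 0.5 if k in (0, n_grid - 1) else 1.0       # trapezoidal rule
        total += weight * shifted_gauss * survival
    return total * h

for w in (2.0, 4.0, 6.07, 8.0):
    print(w, round(f_ev(w, 1e8), 3))
```

With $N_s = 10^8$, $f_{ev}$ should rise from essentially zero at $w = 2$ to close to one at $w = 8$. It should take intermediate values near $w = 6.07$, where Eq. (10) is satisfied.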



B. Variable $N_s$

There are data on the effect of impurities from the work of Turnbull and coworkers […] and of Perepezko and coworkers […], on nucleation in dispersions of liquid droplets […]. These experiments were motivated by the idea that if sufficiently small droplets could be formed, some droplets would be free of all impurities. In those droplets nucleation would then be homogeneous. It is not clear that this objective was achieved […]. Perepezko […] assumed that the impurities are randomly distributed, so that the number of impurity particles in a droplet is given by a Poisson distribution function. He addressed the question of random variation in the number of impurity particles, but not that of variation in the interaction of the impurity with the nucleus. In this sense his work is complementary to ours. If we make the number of sites $N_s$ itself a random variable but set $w = 0$, then we obtain the model of Perepezko […]. If instead we make $N_s$ a random variable while keeping $w$ non-zero, we have a model that describes both variation in the number of impurity particles and disorder in the surface of these particles. We leave such a generalisation to future work.



III. DISORDERED SURFACES 

In the previous section we merely assumed that the presence of disorder introduced a random part $\delta_i$ into the nucleation barrier at site $i$, and that the $\delta_i$ are drawn from a Gaussian distribution. In this section we start from a simple model of a disordered surface. We show that in a certain limit a Gaussian distribution of free-energy barriers is obtained. We then obtain expressions for the mean $m$ and width $w$ of this Gaussian in terms of the parameters that characterise the surface.

Surfaces, for example of impurities, can provide sites for nucleation. We consider a simple planar surface formed of a plane of sites of a cubic lattice, all occupied by fixed monomers. The nucleus is taken to be a block of monomers of a single type, which may or may not be the same type as some of those of the surface. We assume that not more than one monomer can occupy a site. Thus the nucleus can be in contact with the surface, and so interact with it, but it cannot penetrate the surface. Apart from this excluded-volume interaction, the only interactions are those between monomers in contact.

If the surface were uniform, i.e., composed exclusively of one type of monomer, then the free-energy barrier to nucleation would of course be the same at every point on the surface. Now suppose the surface is composed of 2 types of monomers that are not uniformly distributed. The barrier will then vary from point to point, depending on the numbers of monomers of the different types that the nucleus is in contact with at a particular point. A schematic of a cubic nucleus in contact with such a surface is shown in Fig. 1.

Let us call the 2 types of monomer A and B, and assume they are distributed at random. Let monomers of type A and B interact with the nucleus with energies $\epsilon_A$ and $\epsilon_B$, respectively. Then the barrier to nucleation when the nucleus is at a site $i$ in contact with the surface is

$$\Delta F_i^* = \Delta F_0^* + n_i\epsilon_A + (n_c - n_i)\epsilon_B, \tag{18}$$



where $\Delta F_0^*$ is the nucleation barrier when the nucleus is not in contact with the surface. $n_c$ is the total number of sites in the nucleus that contact the surface; as the surface is planar, this number is a constant. $n_i$ is the number of A monomers of the surface in contact with the nucleus when the nucleus is at site $i$. If the monomers of the surface are either A or B at random, then the probability of any one of the $n_c$ sites of the surface being an A-type monomer is just the fraction of A-type monomers, which we denote by $f_A$. The probability of the nucleus being in contact with $n_i$ A-type monomers and $n_c - n_i$ B-type monomers is then binomial. For large $n_c$ it is well approximated by a Gaussian:

$$p_A(n_i) = \frac{n_c!}{n_i!\,(n_c - n_i)!}\,f_A^{n_i}(1 - f_A)^{n_c - n_i} \tag{19}$$

$$p_A(n_i) \approx \frac{\exp\left[-(n_i - m_1)^2/(2w_1^2)\right]}{(2\pi w_1^2)^{1/2}}, \tag{20}$$

where the mean value is $m_1 = f_An_c$ and the variance of the Gaussian is $w_1^2 = n_cf_A(1 - f_A)$.

Using Eqs. (4), (18) and (20), we see that the Gaussian distribution for $n_i$ becomes a Gaussian distribution for $\delta_i$ of variance

$$w^2 = w_1^2(\epsilon_A - \epsilon_B)^2 = n_cf_A(1 - f_A)(\epsilon_A - \epsilon_B)^2. \tag{21}$$

The mean value of the $\Delta F_i^*$ of Eq. (18) is

$$m = \Delta F_0^* + n_c\left[f_A\epsilon_A + (1 - f_A)\epsilon_B\right]. \tag{22}$$

For the nucleation rate to be non-self-averaging we require that $w^2$ be larger than $2\ln N_s$, Eq. (10). Unless $N_s$ is extremely large or small, $2\ln N_s$ will be of order 10. Now suppose the difference in interaction energy between the 2 types of monomer, $\epsilon_A - \epsilon_B$, is a few $k_BT$, and around $n_c = 10$ sites of the surface are in contact with the nucleus. From Eq. (21), $w^2$ will then be around 10 to 30, provided that $f_A$ is neither very small nor close to unity. We therefore predict the following for disordered surfaces composed of significant fractions of different species whose interactions with the nucleus differ by a few $k_BT$. Heterogeneous nucleation at such surfaces will often be dominated by one or a few sites, and will therefore vary appreciably between realisations. Experimentally, this means that the rate will differ appreciably between nominally identical samples.
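Equations (21) and (22) can be checked against a direct simulation of the surface model. The sketch below draws each of the $n_c$ contacts as A with probability $f_A$ (the binomial of Eq. (19)) and builds barriers from Eq. (18). All parameter values and function names are hypothetical, in $k_BT$ units. The sampled variance should agree with Eq. (21), even though the sampled barriers are discrete and Eq. (20) is only the large-$n_c$ approximation.

```python
import random
import statistics

def surface_barrier_params(n_c, f_a, eps_a, eps_b, dF0):
    """Mean m (Eq. (22)) and variance w^2 (Eq. (21)) of the barrier on a random A/B surface."""
    m = dF0 + n_c * (f_a * eps_a + (1.0 - f_a) * eps_b)
    w2 = n_c * f_a * (1.0 - f_a) * (eps_a - eps_b) ** 2
    return m, w2

def sampled_barriers(n_c, f_a, eps_a, eps_b, dF0, n_samples, seed=0):
    """Barriers from Eq. (18), drawing each of the n_c contacts as A with probability f_a."""
    rng = random.Random(seed)
    out = []
    for _ in range(n_samples):
        n_i = sum(1 for _ in range(n_c) if rng.random() < f_a)  # binomial, Eq. (19)
        out.append(dF0 + n_i * eps_a + (n_c - n_i) * eps_b)
    return out

# Hypothetical parameters: n_c = 10, f_A = 0.5, eps_A - eps_B = 3 k_B T, dF0* = 30 k_B T
m, w2 = surface_barrier_params(10, 0.5, 3.0, 0.0, 30.0)
samples = sampled_barriers(10, 0.5, 3.0, 0.0, 30.0, 20_000)
print(m, w2, statistics.mean(samples), statistics.pvariance(samples))
```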

Finally, for the purposes of comparison, we consider adsorption onto the surface of individual monomers of the same type as those that make up the nucleus. For simplicity we do so in the regime of much less than a monolayer, i.e., where the number of adsorbed monomers $\Gamma \ll N_s$. We can then compare the rate $R$ with the adsorbed amount $\Gamma$, to get a feel for which property is more likely to be non-self-averaging. When $\Gamma \ll N_s$, few pairs of adjacent sites are occupied, and so we can treat each surface site as being independent. Then $\Gamma$ is given by

$$\Gamma = \sum_{i=1}^{N_s}\frac{\exp\left[\mu - n_i\epsilon_A - (1 - n_i)\epsilon_B\right]}{1 + \exp\left[\mu - n_i\epsilon_A - (1 - n_i)\epsilon_B\right]}, \tag{23}$$

where $n_i = 1$ if the monomer at site $i$ on the surface is an A-type monomer and $n_i = 0$ if it is a B-type monomer. Here $\mu$ is the chemical potential of the monomers (in units of $k_BT$). The variation of $\Gamma$ from realisation to realisation will depend on $\epsilon_A$, $\epsilon_B$, $f_A$ and $\mu$.

However, this variability comes simply from the fact that the terms in the sum of Eq. (23) take one of two values, depending on whether the monomer is of type A or type B. These two values are bounded by zero and one. We can therefore easily obtain an upper bound for the variation in $\Gamma$ by assuming that the terms in the sum for $\Gamma$, Eq. (23), are either zero or one. This corresponds to, say, the A-type monomers always having a monomer adsorbed onto them, while the B-type monomers never have an adsorbed monomer. For definiteness we assume that the A-type monomers are the ones with adsorbed monomers. This approximation will clearly overestimate the variability in $\Gamma$. Even so, within this approximation the variance of $\Gamma$ is just $f_A(1 - f_A)N_s$ for large $N_s$. The mean is $f_AN_s$, so the ratio of the standard deviation to the mean is

$$\frac{\text{std. dev.}}{\text{mean}} = \left[\frac{1 - f_A}{f_AN_s}\right]^{1/2}, \tag{24}$$



and so is small for large $N_s$ and $f_A = O(0.1)$. At least when the adsorption is small, $\Gamma$ is self-averaging. So disorder large enough to make the rate $R$ non-self-averaging may leave other properties, e.g., $\Gamma$, still self-averaging. As the nucleus is large, $n_c = O(10)$, the variance of the free-energy barrier at a site is large: it is multiplied by $n_c$ in Eq. (21). The rate $R$ is then proportional to the exponential of this large quantity. Both the factor of $n_c$ and the exponentiation strongly enhance the effect of disorder. Together they make the nucleation rate one of the most likely properties of a system to be non-self-averaging.
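The bound of Eq. (24) is easy to check by direct sampling. The sketch below is a minimal illustration under the same zero-or-one approximation used above: every A site carries one adsorbed monomer and no B site does. The function names, the surface size $N_s = 2000$ and $f_A = 0.1$ are hypothetical; the surface is far smaller than a real one. Even so, the adsorbed amount varies by less than ten per cent between realisations, in contrast to the rate in the non-self-averaging regime.

```python
import math
import random
import statistics

def adsorbed_upper_bound(n_sites, f_a, rng):
    """Gamma when every A site carries one adsorbed monomer and no B site does."""
    return sum(1 for _ in range(n_sites) if rng.random() < f_a)

def relative_spread(n_sites, f_a, n_real, seed=0):
    """Standard deviation over mean of Gamma across n_real random surfaces."""
    rng = random.Random(seed)
    gammas = [adsorbed_upper_bound(n_sites, f_a, rng) for _ in range(n_real)]
    return statistics.pstdev(gammas) / statistics.mean(gammas)

n_sites, f_a = 2000, 0.1  # hypothetical small surface
print(relative_spread(n_sites, f_a, 500))
print(math.sqrt((1.0 - f_a) / (f_a * n_sites)))  # Eq. (24)
```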



IV. DETERMINING THE NUCLEATION RATE USING BAYESIAN INFERENCE

In this section we discuss the use of Bayesian inference to determine the probable nucleation rate from measurements of nucleation. This lets us decide whether two (or more) different samples have the same or different nucleation rates. Such an analysis is needed because nucleation is inherently a random process, which hampers direct observation of the effects of disorder on nucleation. There is more than one way to study nucleation, and inference should be applicable to all of them. For definiteness, and because our nucleation rates $R$ are initial nucleation rates, we study determining the rate of nucleation from the time until the first nucleus appears. Fortunately, the inference problem we need to solve is the same as one given and solved as an example in chapter 3 of the textbook of MacKay [17]. We shall therefore give only a brief presentation, and refer the reader to Ref. [17] for details.

Nucleation is due to a fluctuation, and so is random even in a completely uniform, pure system. The time $t$ at which the first nucleus appears is a random variable. Its probability distribution function is an exponential,

$$p(t) = RN_s\exp\left(-RN_st\right). \tag{25}$$



Experiments can also involve counting the number of events. If these events are independent, this number is given by a Poisson distribution function. For example, Galkin and Vekilov […] counted the number of protein crystals formed. The analysis here can also be applied to determine whether or not two Poisson distributions have different means. If they do, then that too indicates a varying nucleation rate.

Let us consider two samples that have been prepared in the same way. If we can determine that they have different nucleation rates, then clearly we must be in the non-self-averaging regime. Conversely, if we examine a number of samples and they all have indistinguishable rates, then we are in the self-averaging regime. A given sample will have some unknown total nucleation rate $RN_s$. Suppose we measure $N_a$ times the time $t$ at which a nucleus appears. We then have $N_a$ values, $t_1$ to $t_{N_a}$, drawn from the distribution of Eq. (25). We denote this set of times by $\{t\}$.

We now need Bayes's theorem, which is [12]

$$P(RN_s|\{t\}) = \frac{p_0(RN_s)\,p(\{t\}|RN_s)}{\int p_0(RN_s)\,p(\{t\}|RN_s)\,d(RN_s)}, \tag{26}$$

where $P(RN_s|\{t\})$ is the probability we want: the probability that the rate is $RN_s$, given the set of measured nucleation times $\{t\}$. Also, $p_0(RN_s)$ is the prior probability distribution, i.e., the probability distribution before we made the measurements. $p(\{t\}|RN_s)$ is the probability of observing the set of nucleation times $\{t\}$ given that the nucleation rate is $RN_s$. This last probability is easily obtained from Eq. (25), which gives the probability of observing a single value of $t$ given the rate. As



the measurements are independent, p {{t}\RNs) is simply 
given by 

p{{t]\RNs) oc {RNsf^Iifj^e-K^{-RNsti) (27) 

^^-^"""^ Di\r+ \ (28) 



ex {RNsY'^e^^i-RNsts), 



where ts is the sum of the Na measurements 
is 



Na 



(29) 



The sign $\propto$ indicates that we have dropped a normalisation constant; normalisation can be restored at the end of the calculation.
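The likelihood of Eq. (28) depends on the data only through $N_a$ and $t_S$, and is largest at $RN_s = N_a/t_S$. A minimal Python sketch of generating such data, assuming (consistent with Eq. (27)) that each waiting time is exponentially distributed with rate $RN_s$; the function names and the rate value are illustrative, not taken from the text:

```python
import random

def sample_nucleation_times(total_rate: float, n_a: int, seed: int = 0) -> list[float]:
    """Draw N_a independent times t_1..t_{N_a} at which the first nucleus appears.

    Assumes p(t | RN_s) = RN_s * exp(-RN_s * t) with RN_s = total_rate,
    in arbitrary time units.
    """
    rng = random.Random(seed)
    return [rng.expovariate(total_rate) for _ in range(n_a)]

def likelihood_peak(times: list[float]) -> float:
    """RN_s maximising (RN_s)^{N_a} exp(-RN_s t_S), i.e. N_a / t_S."""
    t_s = sum(times)  # t_S of Eq. (29)
    return len(times) / t_s

times = sample_nucleation_times(total_rate=2.0, n_a=20)
estimate = likelihood_peak(times)  # a noisy estimate of the true rate 2.0
```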

Using Eq. (28) in Eq. (26), we obtain the probability distribution function of the rate

$$P(RN_s|\{t\}) = c\, p_0(RN_s)\, (RN_s)^{N_a} \exp(-RN_s t_S), \qquad (30)$$

where $c$ is just a constant of normalisation,

$$c^{-1} = \int p_0(RN_s)\, (RN_s)^{N_a} \exp(-RN_s t_S)\, d(RN_s). \qquad (31)$$
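A short worked consequence, assuming a prior that is flat wherever the likelihood is appreciable (true of the top-hat prior of Eq. (32) when $R_0$ is large): Eq. (30) is then a Gamma distribution in $RN_s$ with shape $N_a + 1$ and rate $t_S$. Setting $d[N_a \ln(RN_s) - RN_s t_S]/d(RN_s) = 0$ gives the peak, and the standard Gamma moments give the mean and relative width,

$$\widehat{RN_s} = \frac{N_a}{t_S}, \qquad \langle RN_s \rangle = \frac{N_a + 1}{t_S}, \qquad \frac{\sigma_{RN_s}}{\langle RN_s \rangle} = \frac{1}{\sqrt{N_a + 1}}.$$

For $N_a = 20$ measurements the relative width is $1/\sqrt{21} \approx 0.22$, so each sample's rate is determined to roughly 20%.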
We have considered a pair of randomly generated systems. Each has $N_s = 10^{\ldots}$ sites with free-energy barriers taken from a distribution with mean $m = 20$ and standard deviation $w = 3$. We generate two realisations: the first has a total nucleation rate $RN_s = 3.623 \times 10^{-\ldots}\,\nu$ and the second has $RN_s = 1.575 \times 10^{-\ldots}\,\nu$. To employ Bayesian inference we require a prior distribution for the total rate, $p_0(RN_s)$. We pick a top-hat function,

$$p_0(RN_s) = \begin{cases} 1/R_0, & RN_s < R_0 \\ 0, & RN_s > R_0 \end{cases} \qquad (32)$$



Other reasonable priors give similar results, as they 
should. 
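Eqs. (30) to (32) can be evaluated numerically on a grid. The following is a minimal Python sketch, not necessarily the procedure used to produce Fig. 3: it works with logarithms to avoid overflow of $(RN_s)^{N_a}$, and replaces the integral of Eq. (31) by the midpoint rule on $(0, R_0)$. Function names, the grid size and the illustrative data are assumptions.

```python
import math
import random

def log_unnormalised_posterior(rate: float, n_a: int, t_s: float, r0: float) -> float:
    """log[p0(RN_s) (RN_s)^{N_a} exp(-RN_s t_S)] with the top-hat prior of Eq. (32)."""
    if not 0.0 < rate < r0:
        return -math.inf  # outside the support of the prior
    return n_a * math.log(rate) - rate * t_s - math.log(r0)

def posterior_on_grid(times: list[float], r0: float, n_grid: int = 4000):
    """Return (grid, density): Eq. (30) on midpoints of (0, r0), normalised as in Eq. (31)."""
    n_a, t_s = len(times), sum(times)
    h = r0 / n_grid
    grid = [(i + 0.5) * h for i in range(n_grid)]
    logs = [log_unnormalised_posterior(r, n_a, t_s, r0) for r in grid]
    top = max(logs)  # subtract the maximum before exponentiating to avoid overflow
    weights = [math.exp(v - top) for v in logs]
    norm = sum(weights) * h  # midpoint rule for Eq. (31), up to the factor exp(top)
    return grid, [w / norm for w in weights]

# Illustrative data: 20 exponential waiting times with true rate 2.0 (arbitrary units).
rng = random.Random(1)
times = [rng.expovariate(2.0) for _ in range(20)]
grid, density = posterior_on_grid(times, r0=20.0)
```

The density peaks near $N_a/t_S$; plotting `density` against `grid` for two samples gives curves of the kind shown in Fig. 3.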

We have numerically generated sets of $N_a = 20$ nucleation times for both systems and used both sets of values in Eq. (30). The two resulting probability distribution functions, $P(RN_s|\{t\})$, are plotted in Fig. 3. We used a prior of width $R_0 = 5 \times 10^{-\ldots}\,\nu$. Even with such a broad prior, 20 measurements are clearly enough to show that the two systems very probably have different nucleation rates. Thus, applying Bayes's theorem in this way is an effective means of determining that the rate varies from sample to sample, and hence that the rate is not self-averaging.
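The comparison of two samples can be made quantitative by estimating the posterior probability that one rate exceeds the other. A minimal Python sketch, assuming independent samples and the top-hat prior of Eq. (32): with that prior, Eq. (30) is a Gamma distribution with shape $N_a + 1$ and scale $1/t_S$ truncated to $(0, R_0)$, which can be sampled by rejection. The function names and data are hypothetical, and the rejection loop assumes $R_0$ lies well above the bulk of the posterior, as it does for a broad prior.

```python
import random

def sample_posterior_rate(rng: random.Random, n_a: int, t_s: float, r0: float) -> float:
    """One draw of RN_s from Eq. (30) with the top-hat prior of Eq. (32).

    The posterior is Gamma(shape = N_a + 1, scale = 1/t_S) truncated to (0, R_0);
    draws above R_0 are rejected.
    """
    while True:
        rate = rng.gammavariate(n_a + 1, 1.0 / t_s)
        if rate < r0:
            return rate

def prob_rate1_exceeds_rate2(times1: list[float], times2: list[float], r0: float,
                             n_draws: int = 50_000, seed: int = 0) -> float:
    """Monte Carlo estimate of P(rate of sample 1 > rate of sample 2 | both sets of times)."""
    rng = random.Random(seed)
    n1, s1 = len(times1), sum(times1)
    n2, s2 = len(times2), sum(times2)
    hits = sum(
        sample_posterior_rate(rng, n1, s1, r0) > sample_posterior_rate(rng, n2, s2, r0)
        for _ in range(n_draws)
    )
    return hits / n_draws

# Illustrative data: N_a = 20 times from each of two samples, true rates 10 and 1.
data_rng = random.Random(5)
fast = [data_rng.expovariate(10.0) for _ in range(20)]
slow = [data_rng.expovariate(1.0) for _ in range(20)]
p_differ = prob_rate1_exceeds_rate2(fast, slow, r0=100.0)
```

A probability close to 0 or 1 indicates that the rates differ, whereas samples with indistinguishable rates give values near 0.5.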



V. CONCLUSION 

Nucleation often occurs with the nucleus interacting with, and having its free-energy barrier strongly reduced by, impurities. This is called heterogeneous nucleation. Here, we have addressed the question: under what conditions can chance variations from sample to sample in the impurities present cause the nucleation rate to vary significantly from sample to sample? In the previous section we showed how Bayes's theorem allows an efficient estimation of the nucleation rate in a sample, and so allows variations in this rate to be detected. As the impurities are typically uncharacterised and uncontrolled, we resorted to a statistical theory to model chance, i.e., random, variations in the impurities. The impurities were modelled by quenched disorder, and we showed that the rate of nucleation has the same form as the partition function of Derrida's Random Energy Model [13]. There is a regime where the nucleation rate may differ between samples prepared in the same way, i.e., where it is non-self-averaging. This occurs when the width $w$ of the distribution of nucleation barriers is large. The crossover from this regime to the regime where the nucleation rate is very similar in different samples occurs at a width $w$ given by Eq. (10). The nucleation rate is very sensitive to disorder, in the sense that it may be non-self-averaging even when other properties are still self-averaging. This is in accord with experiment, where nucleation is known to be highly sensitive to impurities [1]. Our study of a specific model of nucleation at a disordered surface (Section III) showed that, at least within this model, the origin of this sensitivity lies in the fact that the nucleus is quite large, consisting of not one but many molecules, and that the rate depends exponentially on the free-energy barrier. Nucleation is important in a number of fields; for example, it is crucial for protein crystallisation [14]. The crystal phase of proteins is required for X-ray determination of their structure.

FIG. 3: The probability distribution function $P$ for the total nucleation rate $RN_s$, obtained using Bayes's theorem applied to $N_a = 20$ measurements of the time $t$ at which the first nucleus appears. The true nucleation rates are $RN_s = 3.623 \times 10^{-\ldots}\,\nu$ (solid curve) and $RN_s = 1.575 \times 10^{-\ldots}\,\nu$ (dashed curve).

Directly observing nucleation and applying Bayes's theorem is not the only way of estimating the effect of disorder on nucleation. An alternative is to follow the fraction of the system that has undergone the phase transition as a function of time. The evolution over time $t$ of this fraction, which we denote by $X(t)$, is often described using the Kolmogorov-Johnson-Mehl-Avrami (KJMA) theory, according to which

$$X(t) = 1 - \exp(-A t^m), \qquad (33)$$

where $A$ is a constant that depends on both the rate of nucleation and the rate of growth of the droplets/crystallites of the new phase. Equation (33) is sometimes referred to as Avrami's law. If the nucleation rate is uniform throughout the system, the exponent $m = d + 1$, with $d$ the dimensionality of space. The power $d + 1$ contains a power $d$ because, if the growth front of the domains of the new phase moves at a constant velocity $v$, the volume of a domain scales as $(vt)^d$. The additional power of time comes from the fact that, for uniform nucleation, the number of domains increases linearly with time $t$. However, if nucleation is not uniform but occurs at just a few sites, then nucleation may occur at these sites at early times and then cease, as the sites with low free-energy barriers have been 'used up'. The KJMA exponent $m$ then equals $d$, not $d + 1$. The nucleation rates $R$ calculated here are initial rates; when the rate $R$ is dominated by a few sites, it will decrease as they are used up. Thus, as has been discussed by Castro and coworkers […], disorder can result in deviations from a simple KJMA growth law with exponent $m = 3$. See Refs. […] for calculations showing effective exponents between 2 and 3. We would expect non-self-averaging systems, where nucleation occurs predominantly at one or a few sites, to exhibit an exponent near $m = 2$. Castro and coworkers also point out that $m$ alone is not a particularly discriminating measure, and that if the new phase is crystalline, the grain-size distribution provides more information.
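To illustrate the crossover from $m = d + 1$ to $m = d$, the following minimal Python sketch computes the local slope of an Avrami plot, $\ln[-\ln(1 - X)]$ against $\ln t$, using the standard KJMA extended-volume construction $X = 1 - \exp(-X_{\rm ext})$. It takes $d = 2$, consistent with the exponents 3 and 2 quoted above, a constant growth speed, and, as a purely hypothetical model of low-barrier sites being used up, a nucleation rate decaying as $e^{-t/\tau}$. Function names and parameter values are illustrative.

```python
import math

def extended_area(t: float, k: float, tau: float) -> float:
    """Extended transformed fraction X_ext(t) in d = 2 (hypothetical model).

    Nucleation rate proportional to exp(-t'/tau); a domain nucleated at t'
    has area proportional to (v (t - t'))^2. Hence
        X_ext(t) = k * int_0^t exp(-t'/tau) (t - t')^2 dt'
                 = k * tau^3 * (x^2 - 2x + 2 - 2 exp(-x)),  x = t / tau,
    with k lumping together the initial rate, v^2 and the shape factor.
    """
    x = t / tau
    # 2 - 2 exp(-x) is written as -2 expm1(-x) to limit cancellation at small x
    return k * tau**3 * (x * x - 2.0 * x - 2.0 * math.expm1(-x))

def avrami_ordinate(t: float, k: float, tau: float) -> float:
    """ln(-ln(1 - X)), the quantity plotted against ln t in an Avrami plot."""
    x_frac = -math.expm1(-extended_area(t, k, tau))  # X = 1 - exp(-X_ext)
    return math.log(-math.log1p(-x_frac))

def effective_exponent(t: float, k: float, tau: float, rel_step: float = 1e-3) -> float:
    """Local Avrami exponent d ln(-ln(1 - X)) / d ln t, by central differences."""
    t_lo, t_hi = t * (1.0 - rel_step), t * (1.0 + rel_step)
    rise = avrami_ordinate(t_hi, k, tau) - avrami_ordinate(t_lo, k, tau)
    return rise / (math.log(t_hi) - math.log(t_lo))

# Close to d + 1 = 3 for t << tau and close to d = 2 for t >> tau.
exponents = {t: effective_exponent(t, k=1e-6, tau=1.0) for t in (0.01, 1.0, 1000.0)}
```

Because the rate decays only gradually in this toy model, a fit of Eq. (33) over a finite time window yields an effective exponent between 2 and 3.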

Finally, Harrowell and Oxtoby [4] have discussed the effects of the rapidly increasing relaxation time (essentially our $\nu^{-1}$) and of the heterogeneity present in a glass. Glassy systems are, of course, known to show non-self-averaging behaviour, so future work could study the non-self-averaging behaviour of the nucleation rate in glasses.

It is a pleasure to acknowledge that this work has benefited greatly from discussions with J. Cuesta. This work was supported by The Wellcome Trust (069242).